Overview

Dataset Statistics

Number of Variables 9
Number of Rows 99441
Missing Cells 4908
Missing Cells (%) 0.5%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 59.7 MB
Average Row Size in Memory 629.4 B
Variable Types
  • Numerical: 1
  • Categorical: 8
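
The overview figures above can be approximated directly from a DataFrame. The following is a minimal sketch, assuming pandas; the helper name `overview` and the toy frame are hypothetical, and the profiler may measure memory differently than `memory_usage(deep=True)`. As a check on the reported percentage, 4908 missing cells out of 99441 × 9 cells is roughly 0.55%, which is consistent with the 0.5% shown.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Summarise a non-empty DataFrame in the style of the statistics above."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())        # missing cells over all columns
    duplicates = int(df.duplicated().sum())     # rows identical to an earlier row
    memory = int(df.memory_usage(deep=True).sum())  # bytes, including string payloads
    return {
        "variables": n_cols,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / (n_rows * n_cols),
        "duplicate_rows": duplicates,
        "memory_bytes": memory,
        "avg_row_bytes": memory / n_rows,
    }


# Toy data for illustration only; these are not rows from the profiled dataset.
toy = pd.DataFrame(
    {
        "order_id": ["a", "b", "b", "c"],
        "order_status": ["delivered", "shipped", "shipped", None],
    }
)
summary = overview(toy)
```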

Dataset Insights

index is uniformly distributed [Uniform]
order_delivered_carrier_date has 1783 (1.79%) missing values [Missing]
order_delivered_customer_date has 2965 (2.98%) missing values [Missing]
order_id has a high cardinality: 99441 distinct values [High Cardinality]
customer_id has a high cardinality: 99441 distinct values [High Cardinality]
order_purchase_timestamp has a high cardinality: 98875 distinct values [High Cardinality]
order_approved_at has a high cardinality: 90733 distinct values [High Cardinality]
order_delivered_carrier_date has a high cardinality: 81018 distinct values [High Cardinality]
order_delivered_customer_date has a high cardinality: 95664 distinct values [High Cardinality]
order_estimated_delivery_date has a high cardinality: 459 distinct values [High Cardinality]
order_id has constant length 32 [Constant Length]
customer_id has constant length 32 [Constant Length]
order_purchase_timestamp has constant length 19 [Constant Length]
order_approved_at has constant length 19 [Constant Length]
order_delivered_carrier_date has constant length 19 [Constant Length]
order_delivered_customer_date has constant length 19 [Constant Length]
order_estimated_delivery_date has constant length 19 [Constant Length]
order_id has all distinct values [Unique]
customer_id has all distinct values [Unique]
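
The five date columns are reported as categorical with a constant length of 19, which suggests they are stored as strings in a `YYYY-MM-DD HH:MM:SS` layout. A minimal sketch of converting them to a datetime dtype follows, assuming pandas and assuming that layout holds for every non-missing value; the helper name `parse_timestamps` and the toy values are hypothetical.

```python
import pandas as pd

# Column names as listed in the report above.
TIMESTAMP_COLUMNS = [
    "order_purchase_timestamp",
    "order_approved_at",
    "order_delivered_carrier_date",
    "order_delivered_customer_date",
    "order_estimated_delivery_date",
]


def parse_timestamps(df: pd.DataFrame, columns=TIMESTAMP_COLUMNS) -> pd.DataFrame:
    """Return a copy with the given string columns parsed as datetimes."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # errors="coerce" turns missing or malformed entries into NaT.
            out[col] = pd.to_datetime(
                out[col], format="%Y-%m-%d %H:%M:%S", errors="coerce"
            )
    return out


# Toy values for illustration only; not taken from the profiled dataset.
toy = pd.DataFrame(
    {"order_approved_at": ["2020-01-01 08:00:00", None, "2020-01-02 09:30:00"]}
)
parsed = parse_timestamps(toy)
```

Once parsed, a profiler would treat these columns as datetimes rather than as high-cardinality text, so the "words" and "constant length" insights would no longer apply to them.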

Variables


index

numerical

Approximate Distinct Count 99441
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1591056
Mean 49720
Minimum 0
Maximum 99440
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5th Percentile 4972
Q1 24860
Median 49720
Q3 74580
95th Percentile 94468
Maximum 99440
Range 99440
IQR 49720

Descriptive Statistics

Mean 49720
Standard Deviation 28706.2884
Variance 8.2405e+08
Sum 4.9442e+09
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
  • index is not normally distributed (p-value 7.259388078010123e-05)
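
These values are what one would expect if `index` holds each integer from 0 to 99440 exactly once, that is, a plain row counter. The sketch below, assuming NumPy, recomputes them under that assumption. The reported standard deviation agrees with the sample formula (`ddof=1`), for which the variance of 0..n-1 is n(n+1)/12. The normality p-value is not reproduced here because the report does not say which test produced it.

```python
import numpy as np

n = 99441  # number of rows reported above
index = np.arange(n, dtype=np.float64)  # assumed contents: 0, 1, ..., n - 1

mean = index.mean()
variance = index.var(ddof=1)  # sample variance
std = index.std(ddof=1)       # sample standard deviation
coefficient_of_variation = std / mean

# Closed form for the sample variance of 0..n-1.
closed_form_variance = n * (n + 1) / 12

# Excess kurtosis from central moments; a uniform distribution gives about -1.2.
centered = index - mean
m2 = np.mean(centered**2)
m4 = np.mean(centered**4)
excess_kurtosis = m4 / m2**2 - 3.0

# Linear-interpolation percentiles (NumPy's default method).
p05, q1, median, q3, p95 = np.percentile(index, [5, 25, 50, 75, 95])
iqr = q3 - q1
```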

order_id

categorical

Approximate Distinct Count 99441
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 9645777

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row e481f51cbdc54678b7...
2nd row 53cdb2fc8bc7dce0b6...
3rd row 47770eb9100c2d0c44...
4th row 949d5b44dbf5de918f...
5th row ad21c59c0840e6cb83...

Letter

Count 1193082
Lowercase Letter 1193082
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1989030
  • order_id contains many words: 99441 words
  • order_id has words of constant length
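
The character counts are internally consistent: 1193082 letters plus 1989030 digits equals 99441 × 32 characters, so every `order_id` consists only of lowercase letters and digits. The truncated samples suggest a hexadecimal format, although the report does not confirm this. A minimal sketch of a format check under that assumption, using only the standard library (the name `is_valid_id` is hypothetical):

```python
import re

# Assumption: an ID is exactly 32 lowercase hexadecimal characters.
ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def is_valid_id(value: str) -> bool:
    """Return True if value matches the assumed 32-character hex format."""
    return ID_PATTERN.fullmatch(value) is not None


# Consistency check on the report's own figures for order_id.
ROWS = 99441
LETTERS = 1193082
DIGITS = 1989030
counts_consistent = (LETTERS + DIGITS) == ROWS * 32
```

The same check applies to `customer_id`, which the report describes with the same length.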

customer_id

categorical

Approximate Distinct Count 99441
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 9645777

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 9ef432eb6251297304...
2nd row b0830fb4747a6c6d20...
3rd row 41ce2a54c0b03bf344...
4th row f88197465ea7920adc...
5th row 8ab97904e6daea8866...

Letter

Count 1193579
Lowercase Letter 1193579
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1988533
  • customer_id contains many words: 99441 words
  • customer_id has words of constant length

order_status

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7356988

Length

Mean 8.9834
Standard Deviation 0.2854
Median 9
Minimum 7
Maximum 11

Sample

1st row delivered
2nd row delivered
3rd row delivered
4th row delivered
5th row delivered

Letter

Count 893323
Lowercase Letter 893323
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (delivered, shipped) account for over 50.0% of the values
  • The most frequent category (delivered) occurs over 87.15 times as often as the second most frequent category (shipped)
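
Both insights above are ratios of category counts. A minimal sketch of how such figures can be derived, using only the standard library; the function name `dominance` and the toy labels are hypothetical, and the profiler's exact computation is not shown in the report.

```python
from collections import Counter


def dominance(values):
    """Compare the two most frequent categories; needs at least two categories."""
    counts = Counter(v for v in values if v is not None)
    (top, top_n), (second, second_n) = counts.most_common(2)
    total = sum(counts.values())
    return {
        "top": top,
        "second": second,
        "ratio": top_n / second_n,                  # how many times as frequent
        "top2_share": (top_n + second_n) / total,   # fraction covered by the top two
    }


# Toy labels for illustration only; the counts are not those of the dataset.
toy_status = ["delivered"] * 6 + ["shipped"] * 2 + ["other"]
result = dominance(toy_status)
```

A ratio as high as the one reported means `order_status` is heavily imbalanced, which matters if the column is later used as a grouping key or a prediction target.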

order_purchase_timestamp

categorical

Approximate Distinct Count 98875
Approximate Unique (%) 99.4%
Missing 0
Missing (%) 0.0%
Memory Size 8353044

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-10-02 10:56:3...
2nd row 2018-07-24 20:41:3...
3rd row 2018-08-08 08:38:4...
4th row 2017-11-18 19:28:0...
5th row 2018-02-13 21:18:3...

Letter

Count 0
Lowercase Letter 0
Space Separator 99441
Uppercase Letter 0
Dash Punctuation 198882
Decimal Number 1392174
  • order_purchase_timestamp contains many words: 51452 words
  • The most frequent word (20171124) occurs over 2.36 times as often as the second most frequent word (20171125)
  • order_purchase_timestamp has words of constant length
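
The words in these insights appear to be timestamp fragments with punctuation removed, so 20171124 would correspond to the calendar date 2017-11-24. Counting orders per day after parsing is easier to interpret. A minimal sketch, using only the standard library and assuming the `YYYY-MM-DD HH:MM:SS` layout; `orders_per_day` and the toy timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the 19-character strings


def orders_per_day(timestamps):
    """Count timestamps per calendar day, keyed by ISO date string."""
    days = (
        datetime.strptime(ts, FMT).date().isoformat()
        for ts in timestamps
        if ts  # skip None and empty strings
    )
    return Counter(days)


# Toy timestamps for illustration only; not taken from the profiled dataset.
toy = [
    "2020-01-01 08:00:00",
    "2020-01-01 17:45:10",
    "2020-01-02 09:30:00",
    None,
]
per_day = orders_per_day(toy)
busiest_day, busiest_count = per_day.most_common(1)[0]
```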

order_approved_at

categorical

Approximate Distinct Count 90733
Approximate Unique (%) 91.4%
Missing 160
Missing (%) 0.2%
Memory Size 8339604

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-10-02 11:07:1...
2nd row 2018-07-26 03:24:2...
3rd row 2018-08-08 08:55:2...
4th row 2017-11-18 19:45:5...
5th row 2018-02-13 22:20:2...

Letter

Count 0
Lowercase Letter 0
Space Separator 99281
Uppercase Letter 0
Dash Punctuation 198562
Decimal Number 1389934
  • order_approved_at contains many words: 42357 words
  • order_approved_at has words of constant length

order_delivered_carrier_date

categorical

Approximate Distinct Count 81018
Approximate Unique (%) 83.0%
Missing 1783
Missing (%) 1.8%
Memory Size 8203272

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-10-04 19:55:0...
2nd row 2018-07-26 14:31:0...
3rd row 2018-08-08 13:50:0...
4th row 2017-11-22 13:39:5...
5th row 2018-02-14 19:46:3...

Letter

Count 0
Lowercase Letter 0
Space Separator 97658
Uppercase Letter 0
Dash Punctuation 195316
Decimal Number 1367212
  • order_delivered_carrier_date contains many words: 37549 words
  • order_delivered_carrier_date has words of constant length

order_delivered_customer_date

categorical

Approximate Distinct Count 95664
Approximate Unique (%) 99.2%
Missing 2965
Missing (%) 3.0%
Memory Size 8103984

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-10-10 21:25:1...
2nd row 2018-08-07 15:27:4...
3rd row 2018-08-17 18:06:2...
4th row 2017-12-02 00:28:4...
5th row 2018-02-16 18:17:0...

Letter

Count 0
Lowercase Letter 0
Space Separator 96476
Uppercase Letter 0
Dash Punctuation 192952
Decimal Number 1350664
  • order_delivered_customer_date contains many words: 41744 words
  • order_delivered_customer_date has words of constant length

order_estimated_delivery_date

categorical

Approximate Distinct Count 459
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size 8353044

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-10-18 00:00:0...
2nd row 2018-08-13 00:00:0...
3rd row 2018-09-04 00:00:0...
4th row 2017-12-15 00:00:0...
5th row 2018-02-26 00:00:0...

Letter

Count 0
Lowercase Letter 0
Space Separator 99441
Uppercase Letter 0
Dash Punctuation 198882
Decimal Number 1392174
  • The most frequent word (000000) occurs over 190.5 times as often as the second most frequent word (20171220)
  • order_estimated_delivery_date has words of constant length
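
The dominant word 000000, together with the samples ending in 00:00:0..., suggests that the time part of this column is always midnight, in which case the column carries only a date. That would also explain why it has just 459 distinct values. A minimal sketch of checking this before dropping the time part, using only the standard library; the function names and toy values are hypothetical.

```python
from datetime import datetime, time

FMT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the 19-character strings


def all_midnight(timestamps) -> bool:
    """Return True if every timestamp has a time component of 00:00:00."""
    parsed = (datetime.strptime(ts, FMT) for ts in timestamps)
    return all(dt.time() == time(0, 0) for dt in parsed)


def to_dates(timestamps):
    """Drop the time part; only safe when all_midnight() holds."""
    return [datetime.strptime(ts, FMT).date() for ts in timestamps]


# Toy values for illustration only; not taken from the profiled dataset.
toy = ["2020-01-01 00:00:00", "2020-01-15 00:00:00"]
midnight_only = all_midnight(toy)
dates = to_dates(toy) if midnight_only else None
```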

Interactions

Correlations

Missing Values